ABSTRACT


The correlation of radiochemical data from samples of fractionated
nuclear debris involves the treatment of two variables whose uncertainties
are comparable. We considered three new criteria for the establishment
of regression parameters for such correlations: least squared perpendicular
distances between points and the line, bisection of the angle formed by
the certain-x and certain-y regression lines, and adoption of the geometric
mean of the certain-x and certain-y regression slopes. We concluded that
the geometric-mean slope $\hat{b}$ was most satisfactory. It is related to
the usual certain-x regression slope $b_{x,x}$ and the coefficient of
correlation $r$ by the simple expression

$$\hat{b} = b_{x,x}/|r|.$$


SUMMARY 


Problem 

The correlation of radiochemical data from fractionated debris does
not meet the usual requirement for the application of least-squares
analysis, namely, that one variable be known with much greater certainty
than the other. Occasionally the mechanical application of the usual
least-squares treatment produces results which appear to be specious.

Findings

Alternative treatments were developed and investigated. These are
based upon criteria which are more appropriate to the situation at hand
and which also give more reasonable results: least squared perpendicular
distances between points and the line, bisection of the angle formed by
the certain-x and certain-y regression lines, and adoption of the
geometric mean of the certain-x and certain-y regression slopes. Of
these, the geometric-mean slope was found to be the most satisfactory.
INTRODUCTION


In applying standard least-squares methods to the statistical analysis
of relations between two variables, one assumes the independent variable
to be much better known than the dependent variable and then proceeds
to determine a regression line by minimizing the squared deviations
of the latter variable. Situations arise where this assumption is
not at all fulfilled. A good example is the correlation of radiochemical
data from fractionated nuclear debris, where dependent and independent
variables are of nearly equal uncertainty. Here the regression slopes
can be heavily influenced by uncertain data lying near the population
extremities. The calculated line then frequently differs significantly
from what the eye would select, leaving the viewer with an uncomfortable
feeling about the reliability of the correlation parameters. An example
will be presented in a later section.

Several obvious remedies for this state of affairs suggest themselves.
One is to minimize the squares of the perpendicular distances from the
regression line instead of those of the vertical distances. Another is to
use the geometric mean of the slopes of the lines for y on x and for x
on y. Still another is to bisect the angle formed by these lines.

The purpose of this report is to develop, test, and evaluate these
methods with a view to applying the results to the correlation of
radiochemical data from fractionated nuclear debris.
NOTATION


The notation below refers to quantities taken from standard statistical
development. Additional notation will be introduced in the text.


$y_i$ = dependent variable
$x_i$ = independent variable
$n$ = number of data points
$\langle u_i \rangle$ = mean value of $u$ = $\frac{1}{n}\sum_i u_i$
$S(u,v)$ = $n\left(\langle u_i v_i \rangle - \langle u_i \rangle \langle v_i \rangle\right) = S(v,u)$
$r$ = coefficient of correlation = $S(x,y)/\sqrt{S(x,x)\,S(y,y)}$
$\theta_{u,v}$ = angle made with the $v$ axis by the regression line obtained
    by assuming certainty in the $u$ values (cf. Figs. 1 and 2)
$b_{u,v}$ = $\tan\theta_{u,v}$
$a_{u,v}$ = $v$ intercept made by the regression line obtained by assuming
    certainty in the $u$ values.


Figures 1 and 2 illustrate the notation and some relations between
the quantities listed. A more complete set of relations is given in
Table 1. Since these relations are either standard (see, for example,
Ref. 1) or immediately derivable (from Figs. 1 and 2), their derivations
are not belabored.
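The notation above translates directly into a few lines of code. The following is a minimal sketch in Python, assuming the data are held in plain lists and that $S(x,y)$ is nonzero. The function names `mean`, `S`, and `basic_quantities` and the sample data are illustrative assumptions, not part of the original report.

```python
from math import sqrt

def mean(u):
    """<u_i>: arithmetic mean of the values u_i."""
    return sum(u) / len(u)

def S(u, v):
    """S(u,v) = n(<u_i v_i> - <u_i><v_i>), symmetric in u and v."""
    n = len(u)
    return n * (mean([ui * vi for ui, vi in zip(u, v)]) - mean(u) * mean(v))

def basic_quantities(x, y):
    """Return r and the slopes and intercepts defined in the notation."""
    sxx, syy, sxy = S(x, x), S(y, y), S(x, y)
    b_xx = sxy / sxx                  # tan(theta_xx): certain-x line, angle from x axis
    b_yy = sxy / syy                  # tan(theta_yy): certain-y line, angle from y axis
    a_xy = mean(y) - b_xx * mean(x)   # y intercept of the certain-x line
    a_yx = mean(x) - b_yy * mean(y)   # x intercept of the certain-y line
    return {
        "r": sxy / sqrt(sxx * syy),
        "b_xx": b_xx, "b_xy": 1.0 / b_xx,
        "b_yy": b_yy, "b_yx": 1.0 / b_yy,
        "a_xy": a_xy, "a_xx": -a_xy / b_xx,
        "a_yx": a_yx, "a_yy": -a_yx / b_yy,
    }

# Hypothetical sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
q = basic_quantities(x, y)
```

The dictionary `q` can be used to check relations such as $r^2 = b_{x,x}\,b_{y,y}$ numerically.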


DERIVATION OF EQUATIONS

In this section equations will be derived in their most concise
form. An investigator desiring to apply the equations to work completed
or in progress will find these forms inconvenient. Therefore a later
section will summarize the equations in practical form, i.e., in terms of
the parameters most likely to be available. Specifically, these are
$a_{x,y}$, $b_{x,x}$, $r$, and $\langle x_i \rangle$.
Fig. 1 Illustrated Quantities and Relations for a Regression Line
Obtained by Minimizing Deviations in the y-direction.

Fig. 2 Illustrated Quantities and Relations for a Regression Line
Obtained by Minimizing Deviations in the x-direction.
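TABLE 1

Summary of Basic Relations


Quantity     Expressed in terms of a                    Expressed in terms of b                           Expressed in terms of S

$b_{x,x}$    $-a_{x,y}/a_{x,x}$                         $1/b_{x,y}$                                       $S(x,y)/S(x,x)$
$b_{y,y}$    $-a_{y,x}/a_{y,y}$                         $1/b_{y,x}$                                       $S(x,y)/S(y,y)$
$r^2$        $a_{x,y}\,a_{y,x}/(a_{x,x}\,a_{y,y})$      $b_{x,x}\,b_{y,y}$                                $[S(x,y)]^2/[S(x,x)\,S(y,y)]$
$a_{x,x}$    $-a_{x,y}/b_{x,x}$                         $\langle x_i \rangle - b_{x,y}\langle y_i \rangle$    $\langle x_i \rangle - [S(x,x)/S(x,y)]\langle y_i \rangle$
$a_{y,y}$    $-a_{y,x}/b_{y,y}$                         $\langle y_i \rangle - b_{y,x}\langle x_i \rangle$    $\langle y_i \rangle - [S(y,y)/S(x,y)]\langle x_i \rangle$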


Geometric Mean Slope

The geometric mean slope $\hat{b}$ is given simply by

$$\hat{b}^2 = b_{x,x}\,b_{y,x} = S(y,y)/S(x,x).$$
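The simple relation between $\hat{b}$, $b_{x,x}$, and $r$ mentioned in the abstract follows in one line from the relations of Table 1. The short derivation below is a sketch that assumes $\hat{b}$ is given the same sign as $S(x,y)$, a choice the text does not state explicitly.

```latex
\hat{b}^{2} = b_{x,x}\,b_{y,x}
            = \frac{b_{x,x}}{b_{y,y}}
            = \frac{b_{x,x}^{2}}{b_{x,x}\,b_{y,y}}
            = \frac{b_{x,x}^{2}}{r^{2}}
\quad\Longrightarrow\quad
\hat{b} = \frac{b_{x,x}}{|r|}
```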


To complete the definition of this line, it is reasonable to impose the
condition that it pass through the point $P_0(x_0, y_0)$ formed by the
intersection of the two regression lines

$$y = a_{x,y} + b_{x,x}\,x$$

$$y = a_{y,y} + b_{y,x}\,x.$$

The coordinates of this point are

$$x_0 = \frac{a_{y,y} - a_{x,y}}{b_{x,x} - b_{y,x}}$$

and

$$y_0 = \frac{b_{x,x}\,a_{y,y} - b_{y,x}\,a_{x,y}}{b_{x,x} - b_{y,x}}.$$

Hence, the intercept with the y-axis of the line with slope $\hat{b}$ and
passing through $P_0$ is

$$\hat{a}_y = y_0 - \hat{b}\,x_0
  = \frac{a_{y,y}(b_{x,x} - \hat{b}) - a_{x,y}(b_{y,x} - \hat{b})}{b_{x,x} - b_{y,x}}$$

and that with the x-axis is

$$\hat{a}_x = x_0 - \frac{1}{\hat{b}}\,y_0 = -\hat{a}_y/\hat{b}.$$
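The construction of the geometric-mean line can be sketched as follows. This is a minimal Python illustration under stated assumptions: $\hat{b}$ is given the sign of $S(x,y)$, $r^2 < 1$ so that the two regression lines actually intersect, and the function name `geometric_mean_line` and the sample data are hypothetical.

```python
from math import sqrt, copysign

def geometric_mean_line(x, y):
    """Slope and intercepts of the geometric-mean line through P0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

    b_xx = sxy / sxx          # certain-x slope
    b_yx = syy / sxy          # certain-y slope, expressed as dy/dx
    a_xy = my - b_xx * mx     # y intercept of the certain-x line
    a_yy = my - b_yx * mx     # y intercept of the certain-y line

    # Assumption: b_hat takes the sign of S(x,y).
    b_hat = copysign(sqrt(b_xx * b_yx), sxy)

    # Intersection P0 of the two regression lines. Undefined when
    # r**2 == 1, because then b_xx == b_yx and the lines coincide.
    x0 = (a_yy - a_xy) / (b_xx - b_yx)
    y0 = (b_xx * a_yy - b_yx * a_xy) / (b_xx - b_yx)

    a_hat_y = y0 - b_hat * x0
    a_hat_x = -a_hat_y / b_hat
    return b_hat, a_hat_y, a_hat_x, (x0, y0)

# Hypothetical sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
b_hat, a_hat_y, a_hat_x, (x0, y0) = geometric_mean_line(x, y)
```

For these data $P_0$ coincides with the point $(\langle x_i \rangle, \langle y_i \rangle)$, since both regression lines pass through the mean of the data.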
The Bisector of the Angle $\theta_{x,x} - \theta_{y,x}$


The line bisecting the angle between $\theta_{x,x}$ and $\theta_{y,x}$ will
form the angle

$$\tilde{\theta} = \tfrac{1}{2}\left(\theta_{x,x} + \theta_{y,x}\right)$$

with the x-axis. Hence we can write

$$\tan 2\tilde{\theta} = \tan\left(\theta_{x,x} + \theta_{y,x}\right)$$

and by familiar trigonometric relations obtain an expression for the
slope $\tilde{b}$ of the bisector:

$$\frac{2\tilde{b}}{1 - \tilde{b}^2} = \frac{b_{x,x} + b_{y,x}}{1 - b_{x,x}\,b_{y,x}}$$

from which

$$\tilde{b} = \frac{b_{x,x}\,b_{y,x} - 1}{b_{x,x} + b_{y,x}}
  \pm \sqrt{\left(\frac{b_{x,x}\,b_{y,x} - 1}{b_{x,x} + b_{y,x}}\right)^2 + 1}$$

where the plus sign is chosen to make $\tilde{b}$ approach $b_{x,x}$ and
$b_{y,x}$ when these latter two quantities approach each other.


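As in the case of the geometric mean slope, the line will be made
to pass through $P_0$. Proceeding as in that case:

$$\tilde{a}_y = y_0 - \tilde{b}\,x_0$$

$$\tilde{a}_x = x_0 - \frac{1}{\tilde{b}}\,y_0 = -\tilde{a}_y/\tilde{b}.$$

For the former intercept

$$\tilde{a}_y = \frac{a_{y,y}\left(b_{x,x}^2 + 1\right) - a_{x,y}\left(b_{y,x}^2 + 1\right)
  - \left(a_{y,y} - a_{x,y}\right)\sqrt{\left(b_{x,x}^2 + 1\right)\left(b_{y,x}^2 + 1\right)}}
  {b_{x,x}^2 - b_{y,x}^2}.$$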
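The bisector formulas can be checked numerically against the defining half-angle relation. The following is a minimal Python sketch, assuming both slopes are positive and unequal. The function names and the input values are hypothetical and chosen only for illustration.

```python
from math import atan, sqrt, tan

def bisector_slope(b_xx, b_yx):
    """Slope of the line bisecting the certain-x and certain-y lines.

    Both slopes are taken as dy/dx. The plus root is used; it selects
    the bisector when both slopes are positive.
    """
    B = (b_xx * b_yx - 1.0) / (b_xx + b_yx)
    return B + sqrt(B * B + 1.0)

def bisector_intercept(a_xy, a_yy, b_xx, b_yx):
    """y intercept of the bisector through P0 (requires b_xx != b_yx)."""
    root = sqrt((b_xx ** 2 + 1.0) * (b_yx ** 2 + 1.0))
    num = (a_yy * (b_xx ** 2 + 1.0)
           - a_xy * (b_yx ** 2 + 1.0)
           - (a_yy - a_xy) * root)
    return num / (b_xx ** 2 - b_yx ** 2)

# Hypothetical slopes and intercepts, for illustration only.
b_xx, b_yx = 0.5, 0.9
a_xy, a_yy = 1.0, 0.2
b_tilde = bisector_slope(b_xx, b_yx)
a_tilde_y = bisector_intercept(a_xy, a_yy, b_xx, b_yx)

# Direct evaluation of the half-angle definition, for comparison.
b_direct = tan(0.5 * (atan(b_xx) + atan(b_yx)))
```

With these inputs the two regression lines meet at $(2, 2)$, so `a_tilde_y` should equal `2 - 2 * b_tilde`, and `b_tilde` should agree with `b_direct`.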


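Least Squared Perpendicular Deviations


The square of the distance $d_i$ of point $P_i(x_i, y_i)$ from a line
$y = \breve{a}_y + \breve{b}\,x$ is known from analytic geometry to be

$$d_i^2 = \left(y_i - \breve{a}_y - \breve{b}\,x_i\right)^2 / \left(\breve{b}^2 + 1\right).$$

Summing over $i$, differentiating partially with respect to $\breve{a}_y$,
and setting the result equal to zero give

$$\langle y_i \rangle - \breve{a}_y - \breve{b}\,\langle x_i \rangle = 0.$$

Carrying out a similar treatment with respect to $\breve{b}$ gives

$$\breve{b}\left(\langle y_i^2 \rangle - \langle x_i^2 \rangle - 2\breve{a}_y \langle y_i \rangle + \breve{a}_y^2\right)
  - \left(\breve{b}^2 - 1\right)\left(\langle x_i y_i \rangle - \breve{a}_y \langle x_i \rangle\right) = 0.$$

Substitution for $\breve{a}_y$ and solution for $\breve{b}$ gives

$$\breve{b} = \frac{S(y,y) - S(x,x)}{2S(x,y)}
  \pm \sqrt{\left[\frac{S(y,y) - S(x,x)}{2S(x,y)}\right]^2 + 1}$$

or

$$\breve{b} = \tfrac{1}{2}\left(b_{y,x} - b_{x,y}\right)
  + \sqrt{\tfrac{1}{4}\left(b_{y,x} - b_{x,y}\right)^2 + 1}$$

where the plus sign is chosen to make $\breve{b}$ approach $b_{y,x}$ and
$b_{x,x}$ when these latter two quantities approach each other.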

The intercept with the y-axis is obtained by eliminating $\langle y_i \rangle$
from the equation

$$\breve{a}_y = \langle y_i \rangle - \breve{b}\,\langle x_i \rangle$$

and the equation

$$a_{x,y} = \langle y_i \rangle - b_{x,x}\,\langle x_i \rangle$$

to get

$$\breve{a}_y = a_{x,y} + \left(b_{x,x} - \breve{b}\right)\langle x_i \rangle.$$

Analogously to the previous cases,

$$\breve{a}_x = \langle x_i \rangle - \frac{1}{\breve{b}}\,\langle y_i \rangle.$$
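The perpendicular-deviation line can be sketched and spot-checked as follows. This minimal Python example assumes positive correlation, $S(x,y) > 0$, in which case the plus root is the minimizing one. The function names `perpendicular_line` and `sum_sq_perp` and the sample data are hypothetical.

```python
from math import sqrt

def perpendicular_line(x, y):
    """Slope and y intercept minimizing the summed squared perpendicular distances.

    Uses the plus root, which is the minimizing root when S(x,y) > 0.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    B = (syy - sxx) / (2.0 * sxy)
    b = B + sqrt(B * B + 1.0)
    return b, my - b * mx

def sum_sq_perp(x, y, a, b):
    """Sum over i of d_i**2 for the line y = a + b*x."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (b * b + 1.0)

# Hypothetical sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
b_perp, a_perp = perpendicular_line(x, y)
best = sum_sq_perp(x, y, a_perp, b_perp)
```

Perturbing either parameter slightly away from `(a_perp, b_perp)` should increase `sum_sq_perp`, which gives a quick check that the closed-form root is the minimum and not the maximum.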


Summary

The equations in this section have been derived in manners chosen
for directness and have not always appeared in the most desirable form.
To remedy this, Table 2 summarizes the results of this section in a way
which illustrates the similarity among the chosen methods and the
circumstances under which the parameters will converge. It is convenient
at this point to introduce the quantity $B$, defined by either of the
equations

$$b^2 - 2Bb - 1 = 0$$

or

$$b = B + \sqrt{B^2 + 1},$$

although we will not have use for it until we discuss the application
of the equations.
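The two defining equations for $B$ are equivalent for positive $b$, which a short round trip makes concrete. The helper names `B_from_b` and `b_from_B` below are hypothetical, introduced only for this sketch.

```python
from math import sqrt

def B_from_b(b):
    """Solve b**2 - 2*B*b - 1 = 0 for B (b must be nonzero)."""
    return (b * b - 1.0) / (2.0 * b)

def b_from_B(B):
    """Positive root of b**2 - 2*B*b - 1 = 0."""
    return B + sqrt(B * B + 1.0)

print(B_from_b(0.5))    # -0.75
print(b_from_B(-0.75))  # 0.5
```

A slope below unity gives a negative $B$, and $b = 1$ gives $B = 0$.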

For conversion of the equations to practical form, it is helpful
to first convert the ingredients to practical form, and this is done in
Table 3 by manipulation of relationships in Table 1. Application of
Table 3 to previously developed equations gives Table 4.

APPLICATION

Consideration of some of the properties of the quantities we have
discussed will provide helpful orientation.
TABLE 2


Summary of Equations in Symmetrical Form


Quantity     Geometric Mean Slope                                     Least-Square Perpendicular Deviations                Slope of Bisecting Line

$a_y$        $y_0 - \hat{b}\,x_0$                                     $\langle y_i \rangle - \breve{b}\,\langle x_i \rangle$   $y_0 - \tilde{b}\,x_0$
$a_x$        $x_0 - y_0/\hat{b}$                                      $\langle x_i \rangle - \langle y_i \rangle/\breve{b}$    $x_0 - y_0/\tilde{b}$
$B$ (a)      $(b_{y,x} - b_{x,y})/\left(2\sqrt{b_{y,x}\,b_{x,y}}\right)$   $(b_{y,x} - b_{x,y})/2$                              $(b_{y,x} - b_{x,y})/(1 + b_{y,x}\,b_{x,y})$

a. Defined by either of the equations $b^2 - 2Bb - 1 = 0$ or $b = B + \sqrt{B^2 + 1}$.
TABLE 3

Summary of Equations for Conversion to Practical Form


Quantity                 Equation

$a_{x,x}$                $-a_{x,y}/b_{x,x}$
$a_{y,x}$                $(1 - r^2)\langle x_i \rangle - r^2\,a_{x,y}/b_{x,x}$
$a_{y,y}$                $a_{x,y} - \frac{1 - r^2}{r^2}\,b_{x,x}\,\langle x_i \rangle$
$\langle y_i \rangle$    $a_{x,y} + b_{x,x}\,\langle x_i \rangle$
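The conversion to practical form can be sketched in code. The minimal Python example below expresses the remaining quantities through $a_{x,y}$, $b_{x,x}$, $r$, and $\langle x_i \rangle$, using only the relations of Table 1. The function name `practical_ingredients` and the input values are hypothetical, and the three slope conversions are derived here from Table 1, not copied from the table above.

```python
def practical_ingredients(a_xy, b_xx, r, mean_x):
    """Express the remaining quantities through a_xy, b_xx, r and <x_i>."""
    r2 = r * r
    return {
        "b_xy": 1.0 / b_xx,                               # from b_xx = 1/b_xy
        "b_yy": r2 / b_xx,                                # from r**2 = b_xx*b_yy
        "b_yx": b_xx / r2,                                # from b_yy = 1/b_yx
        "a_xx": -a_xy / b_xx,
        "a_yx": (1.0 - r2) * mean_x - r2 * a_xy / b_xx,
        "a_yy": a_xy - (1.0 - r2) / r2 * b_xx * mean_x,
        "mean_y": a_xy + b_xx * mean_x,
    }

# Hypothetical inputs, for illustration only.
p = practical_ingredients(a_xy=0.13, b_xx=0.97, r=0.9, mean_x=3.0)
```

Each entry can be cross-checked against Table 1. For example, `p["a_yy"]` should equal `p["mean_y"] - p["b_yx"] * 3.0`.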


* ... the coefficient of determination ... coefficient of
non-determination. He ... latter by $k$ ... He calls $k$ the
coefficient of ...


We first note that while $S(u,v)$ may be either positive or negative,
$S(u,u)$ must always be positive. Reference to Table 1 shows that $S(x,y)$,
$b_{x,x}$, $b_{y,y}$, $b_{x,y}$, $b_{y,x}$ and $r$ will therefore all have
the same sign. It is also obvious from Table 1 that the value of $r$ will
lie between those of $b_{x,x}$ and $b_{y,y}$, while the value of $1/r$
will lie between those of $b_{x,y}$ and $b_{y,x}$.


Table 2 shows that the sign of the quantity $B$ is the same in all
three treatments and governed by $b_{y,x} - b_{x,y}$. From its definition,
each value of $B$ is seen to change sign as the value of $b^2$ goes
through 1. However, most correlations of fractionation data give values
in the range $0 \le b \le 1$, so that $B$ will lie primarily in the
range $B \le 0$.


Reference to Table 4 shows that, for $r^2 = 1$ (perfect correlation),
all $B$'s are equal. Since for positive correlation $\partial B/\partial r$
at $r = 1$ is the same for both the geometric-mean and angle-bisection
treatments, these will have similar values for good positive correlations.
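The equality of the two derivatives can be verified directly. The sketch below assumes $r > 0$, holds $b_{x,x}$ fixed, and uses $b_{y,x}\,b_{x,y} = 1/r^2$ (from Table 1) in the Table 2 expressions for $B$. All derivatives are evaluated at $r = 1$.

```latex
\breve{B} = \tfrac{1}{2}\left(\frac{b_{x,x}}{r^{2}} - \frac{1}{b_{x,x}}\right), \qquad
\hat{B} = r\,\breve{B}, \qquad
\tilde{B} = \frac{2r^{2}}{1+r^{2}}\,\breve{B}

\left.\frac{\partial \hat{B}}{\partial r}\right|_{r=1}
  = \breve{B} + \frac{\partial \breve{B}}{\partial r}

\left.\frac{\partial \tilde{B}}{\partial r}\right|_{r=1}
  = \left.\frac{4r}{(1+r^{2})^{2}}\right|_{r=1}\breve{B}
    + \left.\frac{2r^{2}}{1+r^{2}}\right|_{r=1}\frac{\partial \breve{B}}{\partial r}
  = \breve{B} + \frac{\partial \breve{B}}{\partial r}
```

The perpendicular-deviation treatment lacks the extra $\breve{B}$ term, which is why it separates from the other two first as $r$ falls below unity.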


In general

$$\hat{B} : \breve{B} : \tilde{B} = |r| : 1 : 2r^2/(r^2 + 1)$$

so that $|\tilde{B}| \le |\hat{B}| \le |\breve{B}|$. Now, the relation
between $b$ and $B$ is complicated, but for the range of interest it can
be visualized geometrically as shown by Fig. 3. From this figure it is
apparent that as $-B$ increases, $b$ decreases, and therefore:

$$\tilde{b} \ge \hat{b} \ge \breve{b} \qquad (0 < \breve{b},\ \hat{b},\ \tilde{b} \le 1)$$
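The ratio of the three $B$ values and the resulting ordering of the slopes can be checked numerically. The following minimal Python sketch assumes positive $b_{x,x}$ and $r$. The function names and the input values are hypothetical.

```python
from math import sqrt

def b_from_B(B):
    """Positive root of b**2 - 2*B*b - 1 = 0."""
    return B + sqrt(B * B + 1.0)

def three_treatments(b_xx, r):
    """Return (B, b) pairs for the three treatments, given b_xx and r."""
    b_yx = b_xx / (r * r)     # certain-y slope expressed as dy/dx
    b_xy = 1.0 / b_xx
    diff = b_yx - b_xy
    B_hat = diff / (2.0 * sqrt(b_yx * b_xy))   # geometric mean
    B_breve = diff / 2.0                       # perpendicular deviations
    B_tilde = diff / (1.0 + b_yx * b_xy)       # angle bisection
    return {
        "hat": (B_hat, b_from_B(B_hat)),
        "breve": (B_breve, b_from_B(B_breve)),
        "tilde": (B_tilde, b_from_B(B_tilde)),
    }

# Hypothetical inputs, for illustration only.
r = 0.8
t = three_treatments(b_xx=0.5, r=r)
ratio_hat = t["hat"][0] / t["breve"][0]       # expected |r|
ratio_tilde = t["tilde"][0] / t["breve"][0]   # expected 2r^2/(r^2 + 1)
```

For these inputs all three $B$ values are negative, and the slopes come out in the order bisector, geometric mean, perpendicular deviations, from largest to smallest.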

Figure 4 shows the application of these methods to the correlation
of $\mathrm{Te}^{132}$ data from Shot Sedan (Ref. 3). Two outliers are
evident among the data. These were included in all the calculations
except one, and that one is indicated on the graph. The slopes for
certain-x ($b_{x,x}$) and certain-y ($b_{y,x}$) are seen to be extreme.
The slope for angle bisection ($\tilde{b} = 0.654$) is slightly greater
than that for the geometric-mean slope ($\hat{b} = 0.[?]5$), but the
lines are indistinguishable on the graph. The slopes of these lines
are somewhat larger than that for the least squared perpendiculars.

Fig. 3 Geometrical Relation Between $b$ and $-B$. The members of the family
of right triangles all have unit height and a base equal to $-B$. If a
distance $-B$ is laid off along a hypotenuse, the length of the remainder
of the hypotenuse corresponds to $b$. The locus of these points is shown
by the curved line.

RECOMMENDATION

The choice of a method from the alternatives presented must be made
in light of the realizations that: (1) the choice is not critical; (2)
cases with low values of $r$ are of little practical significance; (3) cases
of $r \approx 1$, $b \approx 1$ and $b \approx 0$ do not usually present a
problem. The considerations we have presented argue in favor of the
geometric-mean regression line ($\hat{b}$) for the following reasons:
(1) its parameters are very simple to calculate from quantities usually
obtained by the conventional practice of regarding x as certainly known;
(2) since it gives results which are nearly equal to those obtained from
the angle-bisection treatment, it has all the advantages of that method;
and (3) it gives results for the slope which are intermediate to those
obtained by rejecting outliers and those obtained by least squared
perpendiculars.

Although little experience has been obtained to date on the application
of this method, no circumstances which would dictate another
choice are foreseeable at this time.

The similarity between the lines obtained by rejection of outliers
and the geometric-mean line indicates that the geometric mean should
receive further attention as a means of handling the general problem
of outliers.